In this project, natural language processing and text mining techniques are applied to explore whether some certain genres of lyrics show up more frequently, what kind of things they conveyed, and what kind of emotions of each genre expressed. Python Natural Language Tookkit is used for data analysis and wordcloud followed with some plots is used for data visualizations.
After cleaning the data, we produced some graphs.
The above figures show the number of songs released from 1970 to 2010 and the proportion that is dedicated to each genre.
As we can see from the graphs, Rock shows up more frequently than any other genres each year and has the most number of songs.
After counting average words of each genre
We find that Hip-Hop contains the most words and most unique words. Rock is the genre with the second largest number of words and unique words, likely due to the large volume of Rock songs in the dataset.
Then, we get the most common words in each genre and produce wordcloud.
Rock
Pop
Metal
Hip-Hop
Country
Jazz
Electronic
R&B
Indie
Folk
We get the the propotion of positive, negative and neutral lyrics by genre and plot the bar chart
From the chart, we can see that the lyrics of Jazz is negaitive.